OpenAI has added two advanced security measures to ChatGPT to address the risk of prompt injection attacks. The new measures are built on the existing security framework, including a sandbox mechanism and URL data leakage protection. The first measure is an optional lockdown mode for users with high security needs, aimed at preventing third parties from deceiving the AI into executing malicious commands or leaking sensitive information.
To enhance the security of ChatGPT Atlas browser, OpenAI has launched a 'using poison to fight poison' strategy, using an automated attacker system to simulate hacker methods for round-the-clock stress testing, focusing on preventing adversarial prompt injection attacks to prevent malicious commands from controlling the AI agent.
Perplexity has launched the BrowseSafe system, designed to protect AI browser proxies from being manipulated by online content in real time. The system claims a 91% success rate in detecting prompt injection attacks, which is higher than GPT-5's 85% and PromptGuard-2's 35%. Additionally, it runs quickly and can monitor in real time. As AI browser proxies become more widespread, such security solutions are becoming increasingly important.
Comet browser by Perplexity has a security flaw allowing unverified input, leading to indirect prompt injection attacks, as reported by Brave's security team.....
codeintegrity-ai
ModernBERT PromptGuard is a high-performance binary classifier specifically designed to detect malicious prompts in large language model applications, including prompt injection and jailbreak attacks.
meta-llama
Llama Prompt Guard 2 is a series of prompt attack detection models launched by Meta, including an upgraded 86M-parameter version and a lightweight 22M-parameter version, designed to detect prompt injection and jailbreak attacks in large language model applications.
Llama Prompt Guard 2 86M is a text classification model designed to detect prompt injection and jailbreak attacks, serving as the second-generation product in the Prompt Guard series.
leolee99
PIGuard is a new type of prompt protection model specifically designed to detect prompt injection attacks. Through an innovative training strategy, it significantly reduces the bias towards trigger words, performs excellently in multiple benchmark tests, surpassing the current best model by 30.8%, and provides a powerful open-source protection solution for LLM security.
InjecGuard is a protective model against prompt injection attacks for large language models (LLMs), capable of effectively identifying and defending against malicious instructions while reducing over-defense issues.
proventra
A prompt injection detection model fine-tuned based on microsoft/mdeberta-v3-base, trained with multiple datasets to identify malicious prompt injection attacks.
dcarpintero
A lightweight model based on ModernBERT, focused on identifying malicious prompt injection attacks and providing AI security protection.
A lightweight model based on ModernBERT (large model edition), specifically designed to identify malicious prompts (i.e., prompt injection attacks).
skshreyas714
Prompt Guard is a text classification model designed to detect prompt attacks, capable of identifying malicious prompt injections and jailbreak attempts.
testsavantai
TestSavantAI models are a set of fine-tuned classifiers specifically designed to defend against prompt injection and jailbreak attacks targeting large language models (LLMs).
The TestSavantAI model is a set of classifiers specifically designed to defend against prompt injection and jailbreak attacks in large language models (LLMs). The tiny version is based on the BERT-tiny architecture, balancing security and computational efficiency.
GenTelLab
GenTel-Shield is a model focused on detecting and defending against prompt injection attacks, effectively distinguishing malicious samples from benign ones.
PromptGuard is a text classification model designed to detect and protect against LLM prompt attacks, capable of identifying malicious prompt injections and jailbreak attempts.
protectai
This is the ONNX format conversion of the fmops/distilbert-prompt-injection model, designed for detecting prompt injection attacks.
This is the ONNX-converted version of the deepset/deberta-v3-base-injection model for detecting prompt injection attacks.
fmops
Dataset for detecting and preventing prompt injection attacks, supporting multilingual text analysis
AI package security scanning tool, offering two modes: CLI and MCP server. It can quickly detect vulnerabilities, prompt injection, and supply chain attacks in MCP servers, AI skills, and software packages.
AI Coding Assistant Security Scanner, scans code vulnerabilities, detects AI hallucination packages, and prevents prompt injection attacks through MCP or CLI, supports 12 languages and more than 1,700 security rules